Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We propose a novel strategy for provenance tracing in random walk-based network diffusion algorithms, a problem that has been surprisingly overlooked in spite of the widespread use of diffusion algorithms in biological applications. Our path-based approach enables ranking paths by the magnitude of their contribution to each node’s score, offering insight into how information propagates through a network. Building on this capability, we introduce two quantitative measures: (i) path-based effective diffusion, which evaluates how well a diffusion algorithm leverages the full topology of a network, and (ii) diffusion betweenness, which quantifies a node’s importance in propagating scores. We applied our framework to SARS-CoV-2 protein interactors and human PPI networks. Provenance tracing of the Regularized Laplacian and Random Walk with Restart algorithms revealed that a substantial amount of a node’s score is contributed via multi-edge paths, demonstrating that diffusion algorithms exploit the non-local structure of the network. Analysis of diffusion betweenness identified proteins playing a critical role in score propagation; proteins with high diffusion betweenness are enriched with essential human genes and interactors of other viruses, supporting the biological interpretability of the metric. Finally, in a signaling network composed of causal interactions between human proteins, the top contributing paths showed strong overlap with COVID-19-related pathways. These results suggest that our path-based framework offers valuable insight into diffusion algorithms and can serve as a powerful tool for interpreting diffusion scores in a biologically meaningful context, complementing existing module- ornode-centric approaches in systems biology. The code is publicly available at https:// github.com/n-tasnina/provenance-tracing.git under the GNU General Public License v3.0.more » « lessFree, publicly-accessible full text available January 3, 2027
-
Motivation: Molecular interaction networks are powerful tools for studying cellular functions. Integrating diverse types of networks enhances performance in downstream tasks such as gene module detection and protein function prediction. The challenge lies in extracting meaningful protein feature representations due to varying levels of sparsity and noise across these heterogeneous networks. Results: We propose ICoN, a novel unsupervised graph neural network model that takes multiple protein–protein association networks as inputs and generates a feature representation for each protein that integrates the topological information from all the networks. A key contribution of ICoN is exploiting a mechanism called “co-attention” that enables cross-network communication during training. The model also incorporates a denoising training technique, introducing perturbations to each input network and training the model to reconstruct the original network from its corrupted version. Our experimental results demonstrate that ICoN surpasses individual networks across three downstream tasks: gene module detection, gene coannotation prediction, and protein function prediction. Compared to existing unsupervised network integration models, ICoN exhibits superior performance across the majority of downstream tasks and shows enhanced robustness against noise. This work introduces a promising approach for effectively integrating diverse protein–protein association networks, aiming to achieve a biologically meaningful representation of proteins. Availability and implementation: The ICoN software is available under the GNU Public License v3 at https://github.com/Murali-group/ICoN.more » « less
-
Abstract BackgroundNetwork propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. ResultsWe design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. ConclusionsWe examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.more » « less
-
Protein sequence models for prediction and comparative analysis of the SARS-CoV-2 —human interactomenull (Ed.)Viruses such as the novel coronavirus, SARS-CoV-2, that is wreaking havoc on the world, depend on interactions of its own proteins with those of the human host cells. Relatively small changes in sequence such as between SARS-CoV and SARS-CoV-2 can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins towards human proteins and compare it to that of the above other viruses. We build supervised machine learning models, using Generalized Additive Models to predict interactions based on sequence features and find that our models perform well with an AUC-PR of 0.65 in a class-skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize what combinations of sequences from the virus and the host is correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses, for example orf3a’s interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.more » « less
An official website of the United States government

Full Text Available